Note: Because of the large number of plots, only a small subset of the analysis results is included in this HTML report. All results and visualizations were exported to the PCA_results/ folder to keep the final file light and readable.
To analyze voting behavior patterns across different types of elections in Paris since 2000, I designed a data pipeline that is structured, reproducible, and compatible with diverse data formats.
Although the Paris open data portal provides an API and structured endpoints, I deliberately relied on cleaned Excel files as the core data source, because the API-provided datasets are incomplete and inconsistent across election years and election types. This approach keeps data access aligned with the available official documents and avoids gaps in long-term comparative analyses.
- Raw files are stored in one subfolder per election type and date (e.g., `Municipales/2020-01`), in `.xls` or `.xlsx` format, downloaded directly from the Paris open data portal.
- Election types covered: Présidentielles, Législatives, Régionales, Européennes, Municipales.
- Regular expressions (`str_match` or `str_extract`) retrieve year and round information from file names.
- Outputs are written as `.xlsx` files in structured subfolders under `*_processed/`, for example:
  - `Régionales_processed/vote_matrix_2015_2eme.xlsx`
  - `Législatives_processed/2022-01/vote_matrix_Circ_11.xlsx`
  - `Municipales_processed/2008-02/vote_matrix_Ardt_19.xlsx`

I emphasize again that while a fully API-based pipeline was conceptually desirable, data gaps and format heterogeneity in the live endpoints made an Excel-based pipeline the more reliable and uniform choice for this project.
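The filename parsing step can be sketched in base R. This is a minimal sketch, not the report's code: it assumes only the two naming patterns visible in the example paths (a `YYYY_1er`/`YYYY_2eme` suffix in the file name, or a `YYYY-NN` folder), the helper `parse_election_path()` is hypothetical, and `regexec()`/`regmatches()` stand in for the stringr calls used in the pipeline.

```r
# Hypothetical helper: extract election year and round from a file path.
# Assumes two naming patterns taken from this report's example paths:
#   "..._2015_2eme.xlsx"  -> year and round in the file name
#   ".../2022-01/..."     -> year and round in the folder name
parse_election_path <- function(path) {
  # Pattern 1: "<YYYY>_<N>er" or "<YYYY>_<N>eme" in the file name
  m <- regmatches(path, regexec("([0-9]{4})_([0-9])(er|eme)", path))[[1]]
  if (length(m) == 0) {
    # Pattern 2: "<YYYY>-<NN>/" as a folder name
    m <- regmatches(path, regexec("([0-9]{4})-([0-9]{2})/", path))[[1]]
  }
  if (length(m) == 0) {
    return(c(year = NA_integer_, round = NA_integer_))  # no known pattern
  }
  c(year = as.integer(m[2]), round = as.integer(m[3]))
}

# Usage on the example paths listed above
paths <- c(
  "Régionales_processed/vote_matrix_2015_2eme.xlsx",
  "Législatives_processed/2022-01/vote_matrix_Circ_11.xlsx",
  "Municipales_processed/2008-02/vote_matrix_Ardt_19.xlsx"
)
t(sapply(paths, parse_election_path))
```

The input paths could be collected beforehand with `list.files(root, pattern = "\\.xlsx?$", recursive = TRUE, full.names = TRUE)`, where `root` is a placeholder for the data folder.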
The diversity of sources and formats requires a robust cleaning pipeline to ensure comparability and quality across datasets. Our data cleaning logic includes:
- Column names standardized with `janitor::clean_names()` for consistent naming across datasets (e.g., `ID_BVOTE` → `id_bvote`)
- Consistent column types (`id_bvote` is always stored as character)
- Missing values (`NA`) in vote counts converted to 0 via `mutate(across(..., replace_na(...)))`
- Arrondissement number extracted from `id_bvote` (e.g., `"18-3"` → 18)
- Region groups (`Nord-Est`, `Sud-Ouest`, etc.) assigned based on the arrondissement code

To avoid inconsistencies in vote category names (e.g., `"NB_BL"` vs `"NB_BL_NUL"`), I excluded all vote count columns prefixed with `nb_` and kept only the expressed vote columns (i.e., actual votes per candidate or list).
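The cleaning steps above can be approximated in base R. This is a minimal sketch rather than the report's code: a simple lower-case/underscore rule stands in for `janitor::clean_names()` (janitor's rules are richer), a loop over numeric columns replaces `mutate(across(..., replace_na(...)))`, and the arrondissement-to-`region_group` table is hypothetical, since the report does not publish its mapping.

```r
# Minimal base-R sketch of the cleaning steps (not the report's actual code).
clean_vote_matrix <- function(df) {
  # 1. Simplified name cleaning (stand-in for janitor::clean_names()):
  #    lower-case, runs of non-alphanumerics -> "_", trim edge underscores
  nm <- gsub("[^a-z0-9]+", "_", tolower(names(df)))
  names(df) <- gsub("^_+|_+$", "", nm)

  # 2. Polling-station ID is always character
  df$id_bvote <- as.character(df$id_bvote)

  # 3. Drop aggregate count columns prefixed with "nb_"
  df <- df[, !startsWith(names(df), "nb_"), drop = FALSE]

  # 4. NA vote counts -> 0 in every numeric column
  num <- vapply(df, is.numeric, logical(1))
  df[num] <- lapply(df[num], function(x) replace(x, is.na(x), 0))

  # 5. Arrondissement = part of id_bvote before the dash ("18-3" -> 18)
  df$arrondissement <- as.integer(sub("-.*$", "", df$id_bvote))

  # 6. HYPOTHETICAL arrondissement -> region_group table (indexed 1..20)
  region_map <- c(rep("Centre", 4),                     # 1-4
                  "Sud-Est", "Sud-Ouest", "Sud-Ouest",  # 5-7
                  "Nord-Ouest", rep("Nord-Est", 3),     # 8-11
                  "Sud-Est", "Sud-Est",                 # 12-13
                  "Sud-Ouest", "Sud-Ouest",             # 14-15
                  "Nord-Ouest", "Nord-Ouest",           # 16-17
                  rep("Nord-Est", 3))                   # 18-20
  df$region_group <- region_map[df$arrondissement]
  df
}

# Usage on a tiny made-up extract
raw <- data.frame(ID_BVOTE = c("18-3", "5-1"), NB_BL = c(3, 2),
                  `MACRON Emmanuel` = c(10, NA), check.names = FALSE)
clean_vote_matrix(raw)
```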
A future extension of this pipeline would involve mapping candidate
names to standardized party labels (e.g.,
"Jean-Luc MÉLENCHON" → "LFI"), particularly
for cross-election comparisons. While not mandatory for matrix
decomposition, this becomes relevant in PCA/CCA extensions discussed in
Section III.
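The candidate-to-party mapping could be implemented as a lookup table followed by column aggregation. This is a hypothetical sketch: only the Mélenchon → LFI entry comes from the text, the other entry is illustrative, unmapped candidates are pooled into an assumed `Other` column, and `rowsum()` sums the candidates that share a label.

```r
# Hypothetical lookup: cleaned candidate column name -> party label.
# Only melenchon_jean_luc -> "LFI" comes from the report; the rest is illustrative.
party_map <- c(
  melenchon_jean_luc = "LFI",
  le_pen_marine      = "RN"
)

# Sum the columns of a station x candidate matrix by party label.
aggregate_by_party <- function(votes, party_map) {
  party <- unname(party_map[colnames(votes)])
  party[is.na(party)] <- "Other"          # unmapped candidates are pooled
  # rowsum() sums rows by group, so transpose in and out
  t(rowsum(t(votes), group = party))
}

votes <- matrix(c(10, 20, 5, 5, 1, 2), nrow = 2,
                dimnames = list(c("bv1", "bv2"),
                                c("melenchon_jean_luc", "le_pen_marine", "hidalgo_anne")))
aggregate_by_party(votes, party_map)
```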
In summary, the extraction and cleaning process is robust to filename irregularities, scales across election types, and is deliberately Excel-centric, which maximizes data completeness and compatibility with archival formats.
I conducted principal component analysis (PCA) on the first round of the 2017 and 2022 presidential elections in Paris to assess the structure and stability of voter preferences. Both elections featured the same two leading candidates, Emmanuel Macron and Marine Le Pen, which provides a natural comparison. In both years, the first principal component (PC1) explained over one-third of the variance, and the variable plots consistently revealed a dominant ideological axis opposing Macron (center) and Le Pen (far-right). Despite the emergence of new candidates in 2022, the PCA structure remained remarkably stable, suggesting that the underlying political space and polarization patterns in Paris changed little between these two electoral cycles.
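The share of variance attributed to PC1 can be obtained with `stats::prcomp()`. This is a minimal sketch under stated assumptions: counts are converted to within-station vote shares and standardized before PCA (the report does not state its exact preprocessing), and the toy matrix is random data for illustration only, not election results.

```r
# Sketch: PCA on a polling-station x candidate vote-count matrix.
# Assumption: within-station shares, centered and scaled.
run_vote_pca <- function(votes) {
  shares <- votes / rowSums(votes)                # each row sums to 1
  pca <- prcomp(shares, center = TRUE, scale. = TRUE)
  var_explained <- pca$sdev^2 / sum(pca$sdev^2)   # proportion per PC
  list(pca = pca, var_explained = var_explained)
}

# Toy data (random counts, NOT real results): 50 stations x 5 candidates
set.seed(1)
votes <- matrix(rpois(50 * 5, lambda = 100), nrow = 50,
                dimnames = list(NULL, paste0("cand_", 1:5)))
res <- run_vote_pca(votes)
# res$var_explained[1] is the quantity reported as PC1's share in the text
round(res$var_explained, 3)
```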
## [1] "./Présidentielles_processed/vote_matrix_2017_1er.xlsx"
## [2] "./Présidentielles_processed/vote_matrix_2022_1er.xlsx"
I performed PCA on the legislative election data at the constituency level to assess whether vote distributions exhibit clear groupings or extreme outliers across polling stations. In most constituencies, the first two principal components explain a modest share of the variance (typically 25–30%), and the individual plots show relatively compact clusters without pronounced extremes. Some constituencies display mild elongation along specific axes, which may indicate latent polarization, but I do not observe distinct subgroups or isolated polling stations. Legislative voting behavior in Paris therefore appears moderately structured, without strong clustering or extreme anomalies.
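The two checks described here, variance captured by the first two components and the presence of isolated stations, can be run per constituency. A minimal sketch on synthetic data: the outlier rule (squared standardized distance on PC1 and PC2 compared with the 99.9% quantile of a chi-square with 2 degrees of freedom) is an assumption for illustration, not the report's criterion.

```r
# Sketch: per-constituency PCA diagnostics (assumed criteria, toy data).
pca_diagnostics <- function(votes, level = 0.999) {
  shares <- votes / rowSums(votes)
  pca <- prcomp(shares, center = TRUE, scale. = TRUE)
  cum2 <- sum(pca$sdev[1:2]^2) / sum(pca$sdev^2)   # PC1 + PC2 share
  # Squared distance in PC1-PC2 space, each axis scaled to unit variance
  z <- sweep(pca$x[, 1:2, drop = FALSE], 2, pca$sdev[1:2], "/")
  d2 <- rowSums(z^2)
  list(cum_var_pc12 = cum2,
       outliers = which(d2 > qchisq(level, df = 2)))
}

# Toy constituency: 50 ordinary stations plus one extreme station (row 51)
set.seed(2)
votes <- rbind(matrix(rpois(50 * 5, lambda = 100), nrow = 50),
               c(1000, 1, 1, 1, 1))
dg <- pca_diagnostics(votes)
dg$cum_var_pc12
dg$outliers
```

Applied with `lapply()` over the list of constituency matrices, this gives one variance figure and one outlier list per constituency.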
Principal component analysis (PCA) applied to municipal elections reveals a relatively noisy structure. The first principal component typically explains around 30–40% of the total variance, but the individual factor maps show widely scattered points without clear ideological clusters, which suggests that no single political axis dominates and that voting patterns vary considerably across polling stations. Preliminary checks also suggest that voter turnout drives part of the principal components: high- and low-participation areas tend to align along the same direction in PCA space. Together, these results support the idea that local context and mobilization, rather than stable partisan alignments, play a larger role in Paris municipal elections.
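The turnout check amounts to correlating station scores on the leading components with the turnout rate. A minimal sketch, assuming a turnout vector computed elsewhere (for instance voters divided by registered voters; the report does not name those columns), on simulated data in which turnout is deliberately tied to one candidate's count:

```r
# Sketch: how strongly do the leading PCs track turnout? (simulated data)
turnout_alignment <- function(votes, turnout, n_pc = 3) {
  shares <- votes / rowSums(votes)
  pca <- prcomp(shares, center = TRUE, scale. = TRUE)
  # Pearson correlation between each PC score and turnout
  drop(cor(pca$x[, 1:n_pc, drop = FALSE], turnout))
}

set.seed(3)
n <- 80
turnout <- runif(n, 0.35, 0.75)                   # simulated turnout rates
votes <- cbind(
  cand_a = round(1000 * turnout),                 # rises with turnout
  cand_b = rpois(n, 300),
  cand_c = rpois(n, 300),
  cand_d = rpois(n, 300)
)
round(turnout_alignment(votes, turnout), 2)
```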
In the European Parliament elections, the first principal component (PC1) consistently explains only about 18–22% of the total variance, indicating a diffuse, multidimensional voting structure. Despite this limited explanatory power, the variable plots suggest a latent opposition between lists associated with more radical or nationalist platforms (e.g., bardella_jordan, maréchal_marion) and lists aligned with mainstream center-left parties (e.g., glucksmann_raphael, toussaint_marie), which often sit at opposite ends of the axis. PC1 may therefore still capture a “mainstream vs. radical” ideological divide, although less sharply than in other elections. Overall, the PCA highlights the fragmented nature of the European election space in Paris, where no single dimension dominates the political landscape.
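Which candidates sit at opposite ends of PC1 can also be read numerically from the loadings rather than only from the variable plots. A minimal sketch with a hypothetical helper `pc_extremes()` and synthetic data in which columns `a` and `b` are built to be anti-correlated; since PCA signs are arbitrary, only the opposition matters, not which end is "negative".

```r
# Sketch: candidates with the most negative / most positive loadings on a PC.
pc_extremes <- function(pca, pc = 1, k = 2) {
  load <- sort(pca$rotation[, pc])        # named loadings, ascending
  list(negative = names(head(load, k)),
       positive = names(rev(tail(load, k))))
}

set.seed(4)
t_latent <- rnorm(100)                     # shared latent factor
x <- cbind(a = t_latent + rnorm(100, sd = 0.1),
           b = -t_latent + rnorm(100, sd = 0.1),
           c = rnorm(100),
           d = rnorm(100))
ends <- pc_extremes(prcomp(x, scale. = TRUE))
ends
```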
The PCA results for the regional elections reveal a consistent territorial structure in voter behavior. The first principal component explains a meaningful share of the variance across years and appears to follow a political-ideological gradient: Valérie Pécresse and Nicolas Dupont-Aignan often occupy one end of the first axis, while candidates such as Pierre Laurent, Olivier Besancenot, or Julien Bayou appear at the opposite end. The primary dimension may therefore capture a left–right or establishment–anti-establishment divide. Moreover, when individuals are colored by region_group, distinct clusters emerge in some years, indicating territorial voting patterns such as a contrast between the Centre/Nord-Est and Sud-Ouest areas. These results suggest that territorial polarization and candidate ideology jointly shape electoral variation in Paris during regional contests.
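The territorial clustering seen when coloring by `region_group` can be summarized by the mean position of each group on the first two components. A minimal sketch on simulated data; the group shift is built into the toy data, not estimated from the elections, and `group_centroids()` is a hypothetical helper.

```r
# Sketch: mean PC1/PC2 score per region group (simulated data).
group_centroids <- function(scores, group) {
  aggregate(as.data.frame(scores[, 1:2]),
            by = list(region_group = group), FUN = mean)
}

set.seed(5)
group <- rep(c("Centre", "Nord-Est", "Sud-Ouest"), each = 30)
shift <- unname(c(Centre = 0, `Nord-Est` = 1, `Sud-Ouest` = -1)[group])
x <- cbind(cand_a = shift + rnorm(90, sd = 0.3),
           cand_b = -shift + rnorm(90, sd = 0.3),
           cand_c = rnorm(90))
pca <- prcomp(x, scale. = TRUE)
group_centroids(pca$x, group)
```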
To evaluate the structural similarity between voting behaviors in the 2022 presidential election and the 2024 European election, I conducted a canonical correlation analysis (CCA) at the polling station level.
The analysis yielded 14 canonical correlations, with very strong correlations on the first axes (Axis 1: 0.993, Axis 2: 0.968, Axis 3: 0.911, Axis 4: 0.846, Axis 5: 0.74).
These values indicate that the leading linear combinations of vote results in one election are almost perfectly correlated with matching combinations in the other, suggesting a high degree of ideological and behavioral alignment across the two electoral contexts.
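Squaring a canonical correlation gives the proportion of variance shared by that pair of canonical variates, which is a more precise reading of alignment than the correlation itself. For the first three axes:

$$
\rho_1^2 = 0.993^2 \approx 0.986, \qquad
\rho_2^2 = 0.968^2 \approx 0.937, \qquad
\rho_3^2 = 0.911^2 \approx 0.830
$$

These shares describe each pair of canonical variates, not the total variance of the two vote matrices; the latter would require a redundancy index.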
Beyond the first few axes, the canonical correlations gradually decline (e.g., Axis 6: 0.536, Axis 10: 0.257), which is expected as deeper dimensions capture more election-specific noise or candidate-specific idiosyncrasies.
This analysis suggests that voters exhibit consistent political orientations across national (présidentielles) and European (européennes) elections, at least along the first few dominant ideological dimensions.
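A minimal sketch of the polling-station CCA with `stats::cancor()`, assuming both elections are stored as station × candidate count matrices whose row names are polling-station IDs, and that only stations present in both years are kept (the report does not describe how stations were matched). The data are simulated.

```r
# Sketch: canonical correlations between two elections' vote shares.
cca_between <- function(x, y) {
  common <- intersect(rownames(x), rownames(y))   # stations present in both
  x <- x[common, , drop = FALSE]
  y <- y[common, , drop = FALSE]
  # cancor() centers by default and works from QR decompositions, so the
  # rank deficiency of shares that sum to 1 reduces the number of axes
  cancor(x / rowSums(x), y / rowSums(y))$cor
}

set.seed(6)
n <- 200
ids <- paste0("bv", 1:n)
x <- matrix(rpois(n * 4, 200), nrow = n,
            dimnames = list(ids, paste0("p", 1:4)))
# Simulated second election: one list tracks candidate p1, the rest is noise
y <- cbind(e1 = x[, "p1"] + rpois(n, 20),
           e2 = rpois(n, 200),
           e3 = rpois(n, 200))
rownames(y) <- ids
r <- cca_between(x, y)
round(r, 3)
```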
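Canonical correlations for all 14 axes:

| Axis | Canonical correlation |
|---:|---:|
| 1 | 0.993 |
| 2 | 0.968 |
| 3 | 0.911 |
| 4 | 0.846 |
| 5 | 0.740 |
| 6 | 0.536 |
| 7 | 0.483 |
| 8 | 0.399 |
| 9 | 0.340 |
| 10 | 0.257 |
| 11 | 0.241 |
| 12 | 0.180 |
| 13 | 0.149 |
| 14 | 0.111 |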
Voting Shift Visualization (by Distance):
This arrow plot shows, for each polling station, the shift in voting preferences between the 2017 and 2022 French presidential elections in PCA space. Each arrow connects a station's position in the 2017 PCA configuration to its position in 2022. The color gradient encodes the Euclidean shift distance: darker arrows indicate small shifts, while yellow highlights larger preference changes. Most arrows are short and cluster around the origin, implying that electoral preferences remained relatively stable for the majority of polling stations. However, a few long arrows in the upper-right quadrant signal dramatic shifts, possibly due to candidate turnover, voter realignment, or local political mobilization.
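The Euclidean shift distance used for the color scale can be computed directly from the two sets of coordinates. This sketch assumes the 2017 and 2022 coordinates are already expressed in a common PCA space and matched row by row on polling station (the report does not detail how the common space is built); `shift_distance()` is a hypothetical helper and the coordinates are made up.

```r
# Sketch: Euclidean shift between two positions per polling station.
shift_distance <- function(coords_from, coords_to) {
  stopifnot(identical(dim(coords_from), dim(coords_to)))
  sqrt(rowSums((coords_to - coords_from)^2))
}

# Toy coordinates on (PC1, PC2) for three stations
p2017 <- rbind(bv1 = c(0, 0), bv2 = c(1, 1), bv3 = c(-1, 2))
p2022 <- rbind(bv1 = c(3, 4), bv2 = c(1, 1), bv3 = c(-1, 0))
d <- shift_distance(p2017, p2022)
d
head(sort(d, decreasing = TRUE), 2)   # stations with the largest shifts
```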
Voting Shift by Region Group:
This plot disaggregates the same shift vectors by region group. Color coding shows that different geographic zones exhibit different voting dynamics. For instance, arrows from the Nord-Est and Nord-Ouest regions appear more dispersed, suggesting greater heterogeneity or volatility in those areas. In contrast, Sud-Ouest and Centre regions show denser, shorter shifts, indicating more stable patterns. This spatial structuring suggests that regional political culture or local socioeconomic factors may mediate how national political changes translate into local voting behavior.
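The contrast described here, with more dispersed shifts in some groups and shorter ones in others, can be quantified with per-group medians and interquartile ranges of the shift distance. A minimal sketch with made-up distances; `shift_summary()` is a hypothetical helper.

```r
# Sketch: summarize shift distances by region group.
shift_summary <- function(shift, group) {
  data.frame(
    region_group = sort(unique(group)),
    median_shift = as.numeric(tapply(shift, group, median)),
    iqr_shift    = as.numeric(tapply(shift, group, IQR))
  )
}

shift <- c(1, 2, 3, 10)                                 # toy distances
group <- c("Centre", "Centre", "Nord-Est", "Nord-Est")
shift_summary(shift, group)
```

A larger interquartile range in a group corresponds to the "more dispersed" arrows seen in the plot.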
Hierarchical clustering based on the PCA results identifies four clusters of voting behavior across polling stations. Cluster 1 (green) is concentrated on the left side of the PCA space, indicating a group of stations with similar preference patterns that are relatively distinct from the others. Cluster 4 (pink), by contrast, is more spread out on the right side, possibly reflecting more diverse or polarized voting tendencies. Clusters 2 and 3 occupy intermediate positions and partially overlap, suggesting transitional or mixed voting profiles.
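Hierarchical clustering on PCA scores can be sketched with `hclust()` and `cutree()`. The settings here (first three components, Euclidean distance, Ward linkage via `method = "ward.D2"`, and k = 4) are assumptions, since the report does not state which settings produced its four clusters, and the data are random.

```r
# Sketch: Ward clustering of polling stations on their leading PC scores.
cluster_stations <- function(scores, n_pc = 3, k = 4) {
  hc <- hclust(dist(scores[, 1:n_pc, drop = FALSE]), method = "ward.D2")
  cutree(hc, k = k)                 # integer cluster label per station
}

set.seed(7)
votes <- matrix(rpois(60 * 6, lambda = 100), nrow = 60)
pca <- prcomp(votes / rowSums(votes), scale. = TRUE)
cl <- cluster_stations(pca$x)
table(cl)
```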